HPAVC

home *** CD-ROM | disk | FTP | other *** search

/ HPAVC / HPAVC CD-ROM.iso / ARASAN_S.ZIP / README < prev next >

Wrap

Text File | 1994-08-15 | 29KB | 583 lines

Arasan - Chess for Windows - version 1.2 Programmer's Guide by Jon Dart Arasan is a chess program for Windows 3.1. It is copyrighted but under terms that allow free distribution for non-commercial purposes. This archive contains the source code for Arasan. I am providing the source code for the use of programmers who may wish to understand how the program works or to modify it for their own use. However, you may not distribute a modified version of this program without the consent of the original author. Arasan is written in C++ and was compiled with the Borland C++ compiler, version 4.0 (the source can also be compiled under Borland C++ version 3.1). Arasan uses the Windows++ class library by Paul DiLascia. Source for Windows++ is not included in this distribution. It is available for $25 from: Paul DiLascia P.O. Box 391632 Cambridge, MA 02139 Note: I have no connection with Paul DiLascia. I am just a user of his library. In a future version of Arasan I hope to convert to using a different, more widely available class library such as MFC or OWL. I have made a few bug fixes in the Windows++ source code; see the file WPP.DIF for details of what was changed. The remainder of this file contains information for use by programmers reading or working on Arasan source code. I assume that you have a working familiarity with C++, Windows programming, and the Borland compiler and tools. Building Arasan I have attempted to make the source code for Arasan as portable as possible, but I don't guarantee that it can be built with any compiler except Borland C++. In particular, since it uses templates, it cannot be compiled with Microsoft Visual C++ 1.0 or 1.5. To compile Arasan, first edit the "make.cfg" file so that BCCDIR points to your installation of Borland C++, and WPPDIR points to your installation of Windows++. If you are using Borland C++ 3.1, you will need to remove the -x- switch from the CFLAGS variable defined in this file. Then just type "make" from the source directory. Make will compile and link the executable (arasan.exe). I recommend that you have a minimum of 4 Megabytes of memory in order to build the source. If you are compiling in a DOS box and run out of memory, try exiting Windows and running "make" from DOS. Opening book format The opening book for Arasan is composed of two binary files, one for the White moves ("BOOKW") and one for the black moves ("BOOKB"). At build time, these files are bound into the executable by the resource compiler. The format of the binary files is documented in makebook.cpp. Since BOOKB and BOOKW are supplied with the source, you don't need to create these files in order to build Arasan. However, if you want to modify the contents of the opening book, you will need to edit the ASCII source for the openings and rebuild BOOKB and BOOKW. Following is information on how to do this. The ASCII source for the opening book is contained in the file BOOK.TXT. Moves are listed one or more per line in standard algebraic notation. A move may optionally be followed by a number from 0 to 9. The number, if given, is the "weight" assigned to the move. A weight of 0 means that the computer will never play the move, but will respond to it if it is played by its opponent. However, if a weight of 0 is specified after a non-zero weight for the same move in the same position has already been specified, the original weight is unchanged. The higher the weight, the more often the computer will choose the move compared to the other alternatives in the book. By default, if no weight is specified, a weight of 5 is assigned. The supplied book uses weights rather sparingly. In general, the computer has been steered away from gambits, and away from openings it doesn't play well. It is also biased towards playing the main line of an opening as opposed to less common variants. Note: I am not a strong player, and so the opening book is probably far from optimal. It is also not very extensive: currently it contains aroud 10000 half-moves, mostly from recent grandmaster games. Suggestions for expansion and/or improvement of the opening book are quite welcome. Some special symbols are allowed in the opening book. Any line beginning with ";" is treated as a comment and ignored. An "m" in the first column means that the present position should be remembered. An "r" in the first column restores the position to where it was when the last "m" was encountered. This allows easy entry of variant lines from the same starting position. Only one remembered position can be stored and retrieved, however: e.g. if you specify "m" twice, the first position stored is discarded. Building the opening book To build the binary opening files from BOOK.TXT, you first need to build the program MAKEBOOK.EXE. To do this, type "make -f makebook.mak". MAKEBOOK is a DOS program that uses some of the same files as the chess program. Since it must be built for DOS and not Windows, make sure you remove any old object files before building it, so that you don't get Windows objects mixed in with the DOS objects. Once MAKEBOOK has been built, just execute it and it will read BOOK.TXT and produce BOOKB and BOOKW. Error messages are issued if any moves are illegal or if there are syntax errors in the input file. Testing support Arasan includes several features to aid debugging. If you compile the source with -DRANGE_CHECK, checks are inserted for accessing arrays past their boundaries, as well as some other sanity checks. If any of these checks fail, an error box will be displayed with the assertion that failed. For debugging purposes, it is also possible to build a DOS executable that contains the Arasan search engine, but no user interface. This executable is called "TESTSRC". To build TESTSRC, do "make -f testsrc.mak". Since testsrc is a DOS executable but uses many of the same files as the Windows executable, be sure to delete any old object files before building TESTSRC. "make clean" will do this. TESTSRC takes a command line that consists of one of two optional switches followed by an optional filename. The two allowable switches are: -p <number> - performs a fixed-ply search to the indicated depth. -t <number> - searches for the given number of seconds. The filename, if specified, contains one or more board positions to be analyzed, one per line. The position info is in EPD (Extended Position Description) format. TESTSRC currently only recognizes a small subset of EPD, sufficient to run the tests. If the search module is compiled with -DTRACE, TESTSRC will print out copious information about the search process when it is run. See search.cpp to see what information is output and where it comes from. Another compile-time debugging option is -DDEBUG_ATTACKS, which will insert test code to check that the attack information for the board is updated correctly when a move is made. Because TESTSRC is a DOS program, it can easily be run from a batch file to execute a series of tests. The following test suites are provided with the source code (see the TESTS directory): 1. TESTS.EPD contains 22 fairly simple tactical problems. Arasan should solve all these problems with a nominal 3-ply search. 2. WAC.EPD contains the 300 problems from Reinfeld's "Win At Chess" book. These are mostly easy tactical problems. Arasan solves about 2/3 of them with a 60-second search limit on a Pentium 60. 3. The file BK.EPD contain the Bratko-Kopec series of tests. It is standard procedure to allow the program 2 minutes for each test. Some of these problems are quite difficult. Arasan currently solves 11 out of the 24 problems when run on a 60MHz Pentium. 4. The file ECE3100.EPD contains the first 100 problems from Encyclopedia of Chess Endgames, volume 3. These are rook endgames. 5. Another endgame test suite is FINE.EPD, containing 48 king and pawn endgame problems from Reuben Fine's Basic Chess Endings. 6. TYPP.EPD contains tests from Bellin and Ponzetto's book Test Your Positional Play. 7. ECM.EPD contains test positions from the Encyclopedia of Chess Middlegames. These are difficult middlegame problems. The RESULTS file in the TESTS directory summarizes Arasan's performance on these test suites. Details of the test runs can be found in the log files, e.g. TESTS.LOG, BK.LOG, etc., which are also in the TESTS directory. Algorithms and data structures 1. The chess board Following is some information about the algorithms and data structures used by Arasan. If you are new to computer chess programming, I suggest first reading a general work on the subject such as Frey (1983) or Marsland and Schaeffer (1990). The chess board in Arasan is represented by an array of 64 squares. Each square contains 0 if it is empty, or a piece identifier if it is occupied. Black pieces have identifier values between 1 and 6, while White pieces have values between 9 and 15. (Note: the program used to use a variant of the "mailbox" representation described by Frey, with 120 squares. This allows easy detection of when a piece has reached the board boundary. However, because stack space is a limited resource in Windows, the representation was changed to use only 64 squares). A special value (127) is used to represent a square that is uninitialized or invalid. The Board class also maintains a parallel board representation called "PiecePos" consisting of two arrays of 16 entries each. These contain lists of the squares occupied by pieces of each side. Any array entry unoccupied by a piece is set to 127. Several other pieces of information are stored in the Board class and are updated for each move. The KingPos array holds the location of each side's king. The PFileCount array holds a count of the number of pawns on each file, for each side, and the RFileCount array holds a similar value for rooks. The EnPassantSq array holds the square position, for each side, on which an en passant capture is possible (obviously, only the side to move can have a possible en passant capture, but we maintain two values for programming convenience). The CastleStatus array holds an enum for each side indicating whether castling has occurred. Also, if the king or a rook has been moved, making castling on one side or another impossible, CastleStatus is set to an appropriate value. Each board position also has a hash code associated with it. The hash code is 32 bits and is computed by fetching, for each piece and square combination, a unique 32-bit code from a table of random numbers, and computing the exclusive or of these codes. The low-order bit of the hash code is then set to identify whether White or Black is to move. Castling status and en passant status are not folded into the hash code, but are stored in a separate field in a hash table entry - they must match the current board in order for a position to be retrieved from the table. Finally, the board position includes an array, for each side, of type Attacks. This contains an integer for each square indicating which pieces and of what kinds attack the square. This information is incrementally updated (by the Attacks class) when a move is made or unmade. Attacks does not store information about "stacked" attackers, which may make some calculations involving this information inaccurate. An earlier chess program I wrote re-calculated complete attack information for each position that was evaluated, but it wound up spending a large fraction of its time in the attack calculation procedure. Incrementally updating the attack information is relatively cheap and probably faster. 2. Moves There are three move classes used in Arasan. "Move" maintains only the minimal information need to unambiguously specify a move: start square, destination square, and promotion value. "ExtendedMove" contains also the piece being moved, the piece being captured (if any), and the type of move (normal, castling, en passant, etc). Finally, "ReversibleMove" contains in addition the castling and en passant status of the board before the move was made. Since this information is needed for restoring the board position, only a ReversibleMove can be "undone". 3. Move Generator The move generation logic is mostly contained in the classes Bearing, Move_Generator and Move_Ordering. Bearing contains static functions to compute squares to which a piece can move. Except for pawn moves, and for castling, move generation is done by table lookup. Each class of piece has an associated table. Each table is indexed by square number and side to move, and contains a list of squares to which the piece could move (these tables are in beardata.h). Additional checks are made to see that the destination square doesn't contain a piece of the same color. However, moves into check are possibly included. The Move_Generation class uses functions from Bearing to generate all moves for a given side. Moves into check are included, unless the side to move is in check. In that case, a special function (Check_Evasions) is called that strictly checks moves for equality. It is very important to know whether any legal moves are possible when in check: if there are none, the side to move is checkmated. Also, some search extensions depend on the existence of a forced move (one single legal move). Move_Generation also calls the Move_Ordering class to re-order the moves. At ply 0, some rather elaborate criteria are used for move ordering: this includes a computation of each moves' positional score, and also (for captures), a call to function attack_estimate, which uses the attack information for the destination square to estimate the gain from a capture. At ply 0, all moves are scored and sorted. At higher plies, a less elaborate algorithm is used, which moves the first few most promising moves to the start of the move array, and then sorts only those. Captures are given a high priority in the search order - higher if the capture appears to gain material, but fairly high even if the capture apparently loses. Promotions are also given special treatment. Arasan also uses the "killer" heuristic. Moves that cause beta cutoff in the search are stored in a killer array, and the move ordering routine will give these moves a higher score than other moves. But captures are generally given a higher score than "killer" moves, so the killer moves have a relatively small effect on the overall move ordering. The scoring algorithms in Move_Ordering have been tuned to maximize search speed, mostly on the tactical problems in the test suite, but there is doubtless more room for improvement. 4. Searching Arasan uses an alpha-beta search algorithm with a variety of search extensions. The search class is the largest single module in the program, and is necessarily rather complicated, but I have tried to structure it and comment it so that it is understandable. I will assume that the reader knows the basics of the alpha-beta algorithm, and will concentrate on describing this implementation of it. In general, the search routine tries to terminate a search tree, or some portion of one, as soon as possible, and will defer as much work as possible until it is certain that no early and quicker termination can be done. The techniques for doing this are mostly well-known and there is nothing very original about the search algorithms used by Arasan. However, as with most chess programs, there is a fine balance between terminating a search too soon and extending it into unprofitable and very unlikely lines of play. The precise nature of this balance depends not only on the search algorithms used, but also the relative efficiency of operations such as move generation, position evaluation and move ordering. Each program therefore strikes this balance in a somewhat different way. The entry point for a search is a routine called find_best_move. This function does some initialization, and then calls move_search, which implements the alpha-beta search algorithm. The search proceeds one ply (half move, i.e. move by one side) at a time. That is, first a one-ply search is done, then a two-ply search, then three, etc. until either the maximum ply limit has been reached or the time control has been exceeded. Each search uses the results of the preceeding search. The variable "max_depth" holds the current nominal ply depth for the search. However, the presence of search extensions means that some nodes may be searched to a greater depth than this. The first step in move_search is to look in the hash table (further described in the next section), in order to see if an identical position has been visited before. This may happen due to a transposition of moves that lead to the same position, or because a previous search to a shallower depth visited the same node. If a hash table entry is found and if it contains a valid value (i.e. one that did not cause cutoff), then that value is returned immediately and no further searching from that node occurs. In other cases, the hash table may not contain an exact value, but may hold an upper or lower bound that can be used to narrow the alpha-beta window. After the hash table check, a further check is made to see if the current board position is drawn, due to insufficient material or to a 3-fold repetition of moves. Arasan does not currently check for draws due to the 50-move rule or variants of it. If the position is drawn, move_search returns the negative of the evaluation function. This penalizes the program for allowing a draw when it is ahead in material or has a superior position. If the search has reached the nominal search depth plus the maximum possible search extension depth, or exceeded the maximum supported ply depth, it also terminates immediately. Normally the initial score for the position is set to -Constants::BIG (BIG is a large integer used several places in the search process). However, a different approach is taken when pursuing search extensions, i.e. portions of the search tree beyond the nominal search depth (max_depth). Three different types of search extension are used in Arasan: a. If the side to move is not in check, and if the search exceeds the maximum depth, searching continues but includes only promotions and capture moves that appear to gain material. There is a limit, set in the Extensions structure, to how many additional plies may be searched. b. If the side to move is in check, the search is also extended, and includes all legal replies, again up to the limit set in Extensions. c. Finally, forced moves (moves with only one possible reply) cause the search to be extended one ply and this ply is not counted towards the computation of any other search limit. In case a., which in the source code is called a "capture search," the current position is evaluated and searching may terminate if the resulting value causes beta cutoff. The reasoning is that there is no point continuing the search if the side to move can "stand pat" without making any furture captures, and still obtain a value high enough for cutoff. However, this early cutoff is inhibited if the side to move has an apparently trapped piece, or appears to have a "hung" piece (a piece subject to profitable capture). If early cutoff doesn't occur, the next step is to try the "null move.". The side to move is changed without altering the board position and the opposing side is then allowed to move. Of course, this could not occur in a game - a player is not allowed to "pass," but must move. However, the theory is that if the null move causes cutoff, then the side to move must have a good position, since in effect giving the opponent a free move still produces a high value for the side to move. In this case, beta cutoff is allowed to occur and no more searching is done from this node. This search enhancement is enabled by compiling with -DNULL_MOVE. If the null move does not cause cutoff, then the principal variation move is searched. In the case of an initial search (e.g. a one-ply search), the principal variation move is just the first move returned by the move generator. Otherwise, at ply 0, it is the highest-scoring move from the previous search iteration. This is obtained by sorting the 0-ply moves from the last iteration, all of which were stored along with their scores. At deeper plies, the hash table is queried and if a best move has been stored for the position, that move is tried first and is considered the principal variation. The search first calls skip_move to see if the move should be skipped. skip_move enforces the search extension limits mentioned above. If the move is not to be skipped, skip_move returns "False", and the search proceeds to call try_move. try_move will first make the move, then query the attack info for the board to see if the side to move is in check (remember, the move generator typically does not exclude moves into check). If a move into check is found, the special value Illegal is returned and the next move is tried. If the move passes the legality check, then move_search is called again (it is thus indirectly recursive). If the return value from move_search did not cause cutoff, then the search must be peformed again with a wider search window. try_move takes care of this. try_move also checks the timer (if a timed search is being done) and determines when to terminate the search. Usually this occurs when 95% of the allotted time has been used, but in some cases the computer will be allowed to search a slightly longer time. Once try_move returns a value for the node, it is compared against alpha and beta. If the value exceeds alpha, then a new best move has been found at this node and the move is stored. If the value exceeds beta, cutoff occurs and no more searching is done. A special check is also made to see if the value indicates that mate has occurred, since there is no point searching any further after that. Assuming the principal variation move does not cause cutoff or mate, then move_search proceeds to search the remainder of the moves. These are searched with a minimal alpha-beta window. Note that since the principal variation move is usually obtained from the hash table or the ply0moves array, it may be the case that the move generator has never had to be called before now. If so, we call it at this time. An earlier version of Arasan tried to optimize things further by only generating part of the moves at a time. That way, if cutoff occurred, the remainder of the move generation could be skipped. However, there was a cost to this in terms of complexity and it did not seem to help search speed much. The final part of move_search checks to see if checkmate or stalemate occurred, and updates the hash table. When the top-level invocation of move_search terminates, it returns to find_best_move, which determines the time used, and updates the "Statistics" structure with the time and other information about the search. 5. The hash table The search routine uses a hash table for storing the results of evaluating previously visited positions. This table is implemented using a template class (Hash) because the opening book generator uses a similar hash table but stores somewhat different information in the table. The hash table is basically an array of lists (class S_List). Each list contains a series of nodes (S_Node), each of which contains some data (in the case of the search engine, a class of type Position_Info) and a pointer to the next node. Each list holds entries that hash, modulo the hash table size, to the same value. Each node contains the whole hash code, so that finding a given node to match a given hash code consists of indexing into the hash table, then following the list until the hash codes match. Besides the hash code, each hash entry also contains the score for the node, a set of flags indicating whether the value is exact, an upper bound or a lower bound, the depth of search used to evaluate the node, a word holding the castling status and en passant square, and the best move for the position. Under Windows, the number of lists in the hash table is configured at runtime based on the amount of memory available. Under DOS, the number of lists is fixed in size. In both cases, there is also a limit on how many nodes can be entered into the table. The hash table is cleared after each search. Memory allocation and deallocation for the hash table can be quite expensive, especially under Windows. To minimize this, operators new and delete are overloaded for both S_List and S_Node classes. These operators use a simple memory allocation scheme that obtains memory from the OS in large chunks and does suballocation out of the chunks (class Pool implements this). Memory freed via "delete" goes into a free list and is immediately available for allocation again via "new". Memory is not actually freed until the "freeAll" method is called, which occurs only when the hash table is cleared. Then the large memory chunks are returned to the OS. 6. Position Scoring There are six main components to the positional score used by Arasan: a. Center control b. Development c. Castling d. Pawn structure e. King safety f. Threats The positional score is typically within the range of plus or minus the value of a pawn (64), but can be greater in some circumstances. When in the endgame (determined by the amount of material on the board) simplified and/or different scoring parameters are used. Also, special scoring code is used for "mopping up" positions, in which one side has a large material advantage. Chess 4.5 from Northwestern University used a similar scheme. Center control is mostly calculated by table lookup. For sliding pieces, a check is made to be sure that the piece has an unobstructed line of movement to the center of the board. This component of the score is not used in the endgame. The development score encourages the program to move its pieces from the back rank, but discourages premature development of the queen. A measure of piece mobility is also calculated for bishops and rooks. Bonuses are awarded for a rook on the 7th rank, and also for a rook on an open or half-open file, and for doubled rooks. In the endgame, the development score encourages the king to move towards the center or opposite side of the board, and to stay near pawns. A bonus is also awarded for having the opposition. This code is not very effective in producing good play, but it is better than nothing. Special-case code is included for KPK and KNBK endgames, which enables the program to play these fairly well. When "mopping up", the development score gives a bonus for restricting the opposing king's mobility and for driving the opposing king to an edge or corner of the board. The castling score penalizes the program for being unable to castle, either temporarily (because a square traversed by castling is under attack) or permanently (because the king or rook has made another move). The pawn structure score penalizes isolated, backward, and doubled pawns, and gives a bonus for passed pawns. If a passed pawn exists, its value depends on its rank and also on whether squares ahead of it are occupied or controlled by the opposing side. The king safety score evaluates the extent to which the king is protected by pawns, and also the extent to which opposing pieces attack squares near the king. It is pretty inexact, but does at least alert the side to move when the king is in serious trouble. The threat score penalizes the side to move for having pieces that appear to be trapped or to be subject to profitable capture ("en prise"). A pinned piece is given the same penalty as a trapped piece. Usually the search routine will extend the search when such situations exist, but we have to do something when the terminal search depth is reached. Contacting the Author I have been working on computer chess programming for around seven years, and this is the second complete chess program I have written. It is still very much imperfect - but I have decided to release it in its present form, both so that others can enjoy playing with it and so that programmers can study it, learn from it, and maybe improve it. I don't guarantee any support for this program. However, if you do find bugs in it, or discover a way to improve it, I would like to hear from you. I can be reached at: Jon Dart 553 N. 17th St. San Jose, CA 95112 Or via Internet at jdart@rational.com. References Frey, Peter W. (ed.) (1983). Chess Skill in Man and Machine. New York: Springer-Verlag. Marsand, T. Anthony and Schaeffer, Jonathan (1990). Computers, Chess and Cognition. New York: Springer-Verlag.